-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ast/opa parse] Support marshalling of all ast location data #5576
[ast/opa parse] Support marshalling of all ast location data #5576
Conversation
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
I have some ideas about how to solve the mutation here and am still working on it. |
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
data["schemas"] = a.Schemas | ||
} | ||
|
||
if len(a.Custom) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many checks here to preserve the omit empty functionality.
// data.foo at foo.rego:5 has annotations {"scope":"subpackages","organizations":["Acme Corp."]} | ||
// data.foo.bar at mod:3 has annotations {"scope":"package","description":"A couple of useful rules"} | ||
// data.foo at foo.rego:5 has annotations {"organizations":["Acme Corp."],"scope":"subpackages"} | ||
// data.foo.bar at mod:3 has annotations {"description":"A couple of useful rules","scope":"package"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new marshalling functions, some of the field orderings have changed.
ast/parser.go
Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser { | |||
return p | |||
} | |||
|
|||
// WithJSONFields sets the JSON fields that should be exposed in the JSON | |||
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if JSONFields would be better as a custom options type. e.g. ASTJSONMarshalOptions
which then each AST struct could read from to determine how to marshal itself...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. as opposed to this being global and applied to all nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have been clearer here. I was considering changing the type of the jsonFields
from map[string]bool
to a custom struct for this use case, such as ASTJSONMarshalOptions
.
ASTJSONMarshalOptions
could wrap settings for global options and individual types. E.g. allow the changing of the json marshalling behaviour for a particular type only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... what would be an example use-case where we'd want to change the marshalling? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we might want to have more control over which fields on which types are enabled / disabled.
For example, we might want to turn on locations for every
only. This would be marginally easier to retrofit with a custom struct rather than a map[string]bool.
map[string]bool{
"every_location": true,
}
ASTJSONMarshalOptions{
EveryLocations: true,
}
Mostly because it'd reduce the number of presence checks in the map and make it easier to make targeted adjustments to certain types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing the map
in JSONFields map[string]bool
with a custom struct would give us the flexibility to expand the feature. We can always start with the global options and then later add in more fine-grained controls.
Another thought is do we need a map of node field type
-> expose or not
. We can simply not expose by default (ie. current behavior) and then only do so when explicitly set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. I think that makes sense to me too. I'll have a go at this now.
do we need a map of node field type -> expose or not.
Perhaps it makes sense to do this now as part of an options wrapper which might grow in future. I'll see what it looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil | ||
} | ||
|
||
func unmarshalLocation(loc *Location, v map[string]interface{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this since I needed to expand the functionality of term unmarshalling now that a location might have been included in the output data. I'm not sure if this is correct, perhaps it should not be possible to unmarshal and get the location data back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only thing that this could affect is where error locations are reported for example, then I don't think it'll be an issue to leave in. If anyone else sees an issue though, I'll defer to them on this.
Thanks all for the first round of reviews, I've reworked this following a suggestion from Ash which I think is a big improvement. Let me know what you think :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlieegan3 this looks good. Few questions to clarify my understanding of the changes.
Term *Term | ||
ExpectedJSON string | ||
}{ | ||
"base case": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the base case
and location excluded
cases are the same ie by default we won't include location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct 🙂 This should be the case for all types.
ast/marshal_test.go
Outdated
return &Import{ | ||
Path: &term, | ||
Location: NewLocation([]byte{}, "example.rego", 1, 2), | ||
jsonFields: map[string]bool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The other tests do not include the jsonFields
field in the output which is fine I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good spot. It's not needed and is inconsistent. I have removed it in 8e2fe4e
ast/parser.go
Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser { | |||
return p | |||
} | |||
|
|||
// WithJSONFields sets the JSON fields that should be exposed in the JSON | |||
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. as opposed to this being global and applied to all nodes?
ast/policy.go
Outdated
} | ||
|
||
// TODO: Comment Location is also always included for legacy reasons. jsonFields data is ignored. | ||
if c.Location != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is this check needed if the location is also always included ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good point. In order to leave this functionality as it is I have changed things around in ee24358.
I think it's clear enough to note this inconsistency on the struct and leave out the no-op custom marshal function for now. I have left the tests in however.
cmd/parse.go
Outdated
@@ -77,5 +105,7 @@ func parse(args []string, stdout io.Writer, stderr io.Writer) int { | |||
|
|||
func init() { | |||
parseCommand.Flags().VarP(parseParams.format, "format", "f", "set output format") | |||
parseCommand.Flags().StringVarP(&parseParams.jsonInclude, "json-include", "", "", "select optional elements, current options: locations, comments. E.g. --json-include locations,-comments will include locations and exclude comments.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why comments
was included as a --json-include
option. Is it because it's easy to remove from the parsed module or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. I added it for two reasons:
- Comments use an inconsistent JSON key name where the fields are capitalised for legacy reasons. For this reason, a use of the command might want to drop them if they are interested in walking the JSON and processing it in an automated way.
- I wanted to include an additional option beyond locations for this flag so that it was clear how it was to be used.
Clearly the latter reason isn't very strong and I'd be more than happy to remove this functionality. I just thought it make it clearer in the PR how that flag was used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: @charlieegan3, I'd suggest waiting to merge until @ashutosh-narkar has had a chance to review this, but I'm marking approve because this looks solid to me.
LGTM! 🙂 I don't see anything wild or terribly unexpected here, although I have 1-2 small curiosity questions. In particular, I'd like to see whether or not we can get the Comment/comment
spelling inconsistency resolved eventually! 😅
} | ||
} | ||
|
||
// TODO: Comment has inconsistent JSON field names starting with an upper case letter. Comment Location is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question]: Possibly a silly question, but is there a way for us to signal that the "legacy" capitalization is being deprecated, so that we can eventually change the field naming to the correct version?
Feel free to ignore if this is out-of-scope for this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to do that in a good way, perhaps we could log to stderr but I'm not sure tbh.
ast/parser.go
Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser { | |||
return p | |||
} | |||
|
|||
// WithJSONFields sets the JSON fields that should be exposed in the JSON | |||
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... what would be an example use-case where we'd want to change the marshalling? 🤔
if generatedRaw, ok := v["generated"]; ok { | ||
if b, ok := generatedRaw.(bool); ok { | ||
expr.Generated = b | ||
} else { | ||
return fmt.Errorf("ast: unable to unmarshal generated field with type: %T (expected true or false)", v["generated"]) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question/curiosity]: What motivated this case? 😃 I'm always happy to see good defensive programming land in core utility functions, but this case is not entirely obvious from its surroundings. Is it caused by generated rules from ast/compiler
or somesuch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding on what generated: true means is elementary. However, I think it's that during compilation, some terms are rewritten and replaced with generated terms, perhaps to 'de-sugar' or some other reason.
Why this is here is that in order to round trip marshal and unmarshal an expression, this unmarshal logic was missing. It's not really related to the main functionality of this PR.
return nil | ||
} | ||
|
||
func unmarshalLocation(loc *Location, v map[string]interface{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only thing that this could affect is where error locations are reported for example, then I don't think it'll be an issue to leave in. If anyone else sees an issue though, I'll defer to them on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlieegan3 this is getting close 😄 Just had few comments around how we define the fields to expose.
ast/marshal_test.go
Outdated
// text is not marshalled to JSON so we just drop it in our examples | ||
rule.Location.Text = nil | ||
|
||
testCases := map[string]struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the difference between the base
and location
case.
Also in general can you please help me with what are you testing with the Test*_UnmarshalJSON
tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the difference between the base and location case.
The idea here was to show that by default, the locations were not included AND that they were not included when the flag was set to false explicitly. It is perhaps unnecessary!
Also in general can you please help me with what are you testing with the Test*_UnmarshalJSON tests?
Before this PR, a number of types didn't support the unmarshalling of location data found in their JSON representations. As the PR evolved, this become only relevant for the Term and Expr types. However the tests remain for all. We could strip out the unmarshal tests for all types without custom unmarshalling. See the helper unmarshalLocation
in the diff for where this is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ast/parser.go
Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser { | |||
return p | |||
} | |||
|
|||
// WithJSONFields sets the JSON fields that should be exposed in the JSON | |||
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing the map
in JSONFields map[string]bool
with a custom struct would give us the flexibility to expand the feature. We can always start with the global options and then later add in more fine-grained controls.
Another thought is do we need a map of node field type
-> expose or not
. We can simply not expose by default (ie. current behavior) and then only do so when explicitly set.
I've rebased this again today. I'm pretty happy with the implementation now, would you be able to have another quick look @ashutosh-narkar? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @charlieegan3! Can you please add a test in cmd/parse_test.go
?
@@ -444,6 +455,14 @@ func (term *Term) UnmarshalJSON(bs []byte) error { | |||
return err | |||
} | |||
term.Value = val | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We should update the func desc to something like Specialized unmarshalling is required to handle Value and location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot, I've added this in fea4dbf
Thanks for the review Ash, I've made some changes to the parse command tests in 2c571c2. I had to make some edits to the parse function to allow the options to be passed in. Hope that all looks good now 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! And much awaited 💯 🚀 👏
One thing I'm curious about, and I don't see any references to it in the PR: the rego.parse_module built-in function. It's an outlier in many ways, and I am most likely its only user 😄 From my understanding of the code here, the output of that built-in function will not change with this. Correct? It'd be useful if that would emit location data too, but I'm not sure if we can have it changed. And we can't really change the function signature to allow for configuration options either. Might have to build a custom function for my needs, which is fine :)
OTOH, I'm not sure that adding data to its AST output would be considered breaking..
"value": { | ||
"location": { | ||
"file": "TEMPDIR/x.rego", | ||
"row": 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that we get the location of the value, but not of the var. I guess it's easy to deduce that it'll be on the row 0? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed 😞 The reason for this is that Var is 'just' a string and has no other internal state so we'd have no where to store the location data or JSON options to render it. I'm not sure there is a lot we can do here without making some pretty major changes to the way the ast package works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsandall any ideas?
Yeah that function doesn't use any options at the moment. The default in this PR is to leave the json marshalling as it is but alter the options available. In the
Given this, I think it might be best to use a custom function for our use case 😉 |
Part of: #3143 This is being done so that the json representation of an ast parse can be used more easily by automated tools which need the location information. This location information is verbose for human users of the data and has until now been excluded from JSON representation. This PR makes it possible to configure the JSON marshalling with some options. However, the functionality of the OPA parse command has been left mostly unchanged. The only with the json output is that the fields are in a different order: ``` diff --git a/new.json b/old.json index 5b9504fa..0fb9d4f2 100644 --- a/new.json +++ b/old.json @@ -13,15 +13,6 @@ }, "rules": [ { - "body": [ - { - "index": 0, - "terms": { - "type": "boolean", - "value": true - } - } - ], "head": { "name": "allow", "value": { @@ -34,17 +25,26 @@ "value": "allow" } ] - } + }, + "body": [ + { + "terms": { + "type": "boolean", + "value": true + }, + "index": 0 + } + ] } ], "comments": [ { + "Text": "IGNvbW1lbnQ=", "Location": { "file": "main.rego", "row": 3, "col": 1 - }, - "Text": "IGNvbW1lbnQ=" + } } ] } ``` It's also possible to now show locations for all terms with another flag option. ``` $ go run main.go parse main.rego --format=json --json-include=locations { "package": { "location": { "file": "main.rego", "row": 1, "col": 1 }, "path": [ { "location": { "file": "main.rego", "row": 1, "col": 9 }, "type": "var", "value": "data" }, { "location": { "file": "main.rego", "row": 1, "col": 9 }, "type": "string", "value": "foo" } ] }, "rules": [ { "body": [ { "index": 0, "terms": { "type": "boolean", "value": true } } ], "head": { "name": "allow", "value": { "location": { "file": "main.rego", "row": 5, "col": 9 }, "type": "boolean", "value": true }, "ref": [ { "type": "var", "value": "allow" } ] } } ], "comments": [ { "Location": { "file": "main.rego", "row": 3, "col": 1 }, "Text": "IGNvbW1lbnQ=" } ] } ``` Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
This change adds a new struct JSONOptions which allows the JSON marshalling and unmarshalling of ast nodes to be further customized in the future. It's my hope that this will be more extendable than the simple map we were using before. Signed-off-by: Charlie Egan <charlie@styra.com>
Before this PR, a number of types didn't support the unmarshalling of location data found in their JSON representations. As the PR evolved, this become only relevant for the Term and Expr types. Other unmarshal tests are removed in this commit. Signed-off-by: Charlie Egan <charlie@styra.com>
This adds tests for: * pretty output * json output * json location and comment flags This required the parse function to be adjusted to make the options configurable in test cases. Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Fixes #3143
This is being done so that the json representation of an ast parse can
be used more easily by automated tools which need the location
information. This location information is verbose for human users of the
data and has until now been excluded from JSON representation.
This PR makes it possible to configure the JSON marshalling with some
options. However, the functionality of the OPA parse command has been
left mostly unchanged.
The only with the json output is that the fields are in a different
order:
It's also possible to now show locations for all terms with another flag
option.
Reviewer Notes:
json-include